# TWIN-RELECT

Twinning for excellence in reliable electronics

# D1.1

# DELIVERABLE REPORT



# D1.1 Research Plan

WP1: Joint Research





# **Document information**

| Deliverable/Title | D1.1 Research Plan                                                      | Work Package               | 1                   |
|-------------------|-------------------------------------------------------------------------|----------------------------|---------------------|
| Leading Partner   | UTH                                                                     | Participating Partner(s)   | UTH, IHP, CNRS, MAN |
| Authors           | Nikos Chatzivangelis, Georgios-Ioannis Paliaroutis, Pelopidas Tsoumanis |                            |                     |
| Editors           | Christos Sotiriou, Luigi Dilillo, Davide Bertozzi, Marko Andjekovic     |                            |                     |
| Deliverable Type  | R                                                                       | Dissemination Level        | PU                  |
| Official Submission Date | M6 of Project                                                    | Actual Submission Date     | 31/3/25             |

**Document history**

| Version | Date       | Description                                   | Editors                                                        | Comments |
|---------|------------|-----------------------------------------------|----------------------------------------------------------------|----------|
| 0.1     | 10/03/2025 | Initial Draft                                 | Nikos Chatzivangelis, Georgios-Ioannis Paliaroutis             |          |
| 0.2     | 12/03/2025 | Rough writing of some sections                | Nikos Chatzivangelis, Georgios-Ioannis Paliaroutis             |          |
| 0.3     | 15/03/2025 | EDA Tool Flow for Reliability Analysis        | Nikos Chatzivangelis                                           |          |
| 0.4     | 20/03/2025 | Research Objectives                           | Nikos Chatzivangelis, Georgios-Ioannis Paliaroutis             |          |
| 0.5     | 25/03/2025 | Collaboration and Coordination Among Partners | Georgios-Ioannis Paliaroutis, Pelopidas Tsoumanis              |          |
| 0.6     | 28/03/2025 | Publications Strategy                         | Luigi Dilillo, Davide Bertozzi, Nikos Chatzivangelis           |          |
| 0.7     | 28/03/2025 | Potential Risks                               | Luigi Dilillo, Nikos Chatzivangelis                            |          |
| 0.9     | 30/03/2025 | Unreviewed Final Version                      | Nikos Chatzivangelis, Georgios-Ioannis Paliaroutis             |          |
| 1.0     | 31/03/2025 | Final Version                                 | All Authors & Editors                                          |          |

## DISCLAIMER

Funded by the European Union (Grant Agreement № 101160314). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.







# Contents

1. Introduction
   - 1.1 Fault Effects on Electronic Devices
2. EDA Tool Flow for Reliability Analysis
   - 2.1 Tools Involved in the Reliability Analysis Tool Flow
     - 2.1.1 PredicSEE
     - 2.1.2 ECORCE
     - 2.1.3 UPSET
     - 2.1.4 EMBER
   - 2.2 Reliability Analysis Methodology
     - 2.2.1 Reliability Analysis at Technology Level
     - 2.2.2 Reliability Analysis at Design Level
   - 2.3 Flow Integration and Tools Interaction
     - 2.3.1 Device-Level and Transistor-Level Interaction
     - 2.3.2 Gate-Level and System-Level Communication
     - 2.3.3 Data Flow and Communication Between Tools
3. Research Objectives
   - 3.1 Characterization and Modeling of Fault Effects
   - 3.2 Analysis of Fault-Tolerance Techniques and Definition of Evaluation Metrics
     - 3.2.1 Characterization of fault tolerance mechanisms
     - 3.2.2 Definition of reliability metrics
   - 3.3 STA-based Fault Analysis
   - 3.4 STA-based Optimizations
     - 3.4.1 Optimization Techniques
     - 3.4.2 Closed-loop Optimization
   - 3.5 Benchmarking and Scalability Testing
4. Collaboration and Coordination Among Partners
   - 4.1 Joint Research Through Long-term Training of ESRs
   - 4.2 Joint Irradiation Experiments
5. Publications Strategy
   - 5.1 Identified Key Research Topics & Trends
   - 5.2 Target High-Impact Journals & Conferences
6. Potential Risks
   - 6.1 Coping with Technical Risks
   - 6.2 Coping with Logistical & Operational Risks
   - 6.3 Coping with Financial & Resource Risks
7. Conclusions







# 1. Introduction

The increasing demand for high-performance, low-power, and compact integrated circuits (ICs) has led to significant advancements in semiconductor technologies. However, these same advancements introduce critical challenges in ensuring long-term reliability of electronic systems, especially in radiation-prone or mission-critical environments. As technology scales to sub-22nm nodes and the complexity of System-on-Chip (SoC) architectures rises, the vulnerability of digital circuits to both transient (e.g., soft errors, SEEs) and permanent faults (e.g., aging, TID effects) grows considerably.

The TWIN-RELECT project addresses this challenge through a cross-layer EDA research and training initiative, aiming to build capacity and foster excellence in the design of reliable electronics. Work Package 1 (WP1) focuses on joint research efforts to develop and integrate an end-to-end EDA tool flow for reliability analysis and enhancement, encompassing technology modeling, fault simulation, timing analysis, and design optimization. The tools and methodologies developed in WP1 will be validated across a wide range of benchmarks, including advanced AI accelerators, space-grade microcontrollers, and asynchronous logic blocks.

The research is grounded in the belief that reliability must be addressed across all abstraction levels — from physical device behavior under radiation, to transistor-level characterization, gate-level timing analysis, and finally system-level functional robustness. To this end, the project proposes a comprehensive, modular EDA tool flow, integrating tools such as PredicSEE, ECORCE, SPICE, UPSET, and EMBER (as illustrated in Section 2).

This deliverable outlines the research plan to be executed within WP1. It begins by introducing the fault effects that threaten the reliability of electronic devices (Section 1.1). Section 2 provides a detailed description of the EDA tool flow, covering each layer of analysis from device to system level, and the process for fault modeling, simulation, optimization, and benchmarking. Sections 3 through 6 cover research objectives, collaborative strategy among partners, publication strategy, and risk management procedures to ensure successful implementation.

This deliverable serves both as a technical blueprint for the tool flow's development and as a strategic roadmap for the collaborative research activities that will shape the future of reliable electronics design within the TWIN-RELECT consortium.

## 1.1 Fault Effects on Electronic Devices

The performance and reliability of critical systems can be significantly affected by various factors, especially in harsh radiation environments. Radiation-induced disturbances can cause transient or permanent faults, affecting the functionality of applications utilized in automation, aerospace, and medicine. Therefore, analyzing radiation effects, including **Single Event Effects (SEEs)** and **Total Ionizing Dose (TID)**, together with device **aging**, is necessary for designing more resilient semiconductor technologies and enhancing the reliability of electronic systems.

The first type of fault effect is SEEs, which primarily occur due to ionizing radiation, such as high-energy particles and cosmic rays, leading to transient or permanent faults in semiconductor devices. These effects can considerably impact the reliability of contemporary ICs, posing significant challenges to the functionality of applications in sectors such as automotive, aerospace, and medical fields, where radiation exposure is constant. Accurately modeling these effects therefore enables the design of more resilient semiconductor technologies and helps ensure the reliability of electronic systems. **SEEs** are generally classified into **soft errors** and **permanent errors**:

- **Soft Error** (Transient Error)
  - A soft error occurs when radiation (such as cosmic rays or alpha particles) temporarily disrupts an electronic component, such as a memory cell or logic circuit, without causing permanent damage.
  - It is a **non-destructive** error, meaning the system can recover from it (e.g., by refreshing memory or error correction techniques).
  - Example: A **Single Event Upset (SEU)** in which a radiation particle flips a bit in a memory cell but does not physically damage the hardware.
- **Permanent Error** (Hard Error)
  - A permanent error happens when radiation **physically damages** an electronic component, leading to irreversible failure.
  - This could be due to **Single Event Latch-up (SEL)** (a short circuit that permanently disables a circuit) or **Single Event Gate Rupture (SEGR)** (damage to a transistor's gate oxide). The affected hardware usually requires **replacement** or **repair**.

Extended exposure to ionizing radiation in a system can cause **Total Ionizing Dose (TID)** effects, leading to charge buildup in the oxide layers of semiconductor devices. Increased leakage currents, threshold voltage shifts, and generally gradual degradation of transistor performance are some indicative impacts that can lead to severe failure or malfunctions if not appropriately mitigated. Electronic systems utilized in space are mainly prone to TID effects, making it crucial to ensure their functionality and reliability.

Device **aging** is a significant factor affecting the long-term functionality of electronic systems. Mechanisms such as Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI) gradually shift the threshold voltage (Vth), increasing circuit delays and the likelihood of failures. These effects can be investigated experimentally by exposing targeted circuits to increased supply voltages and elevated temperatures while monitoring factors such as power consumption and error rates. This procedure will contribute to evaluating the reliability of critical devices.
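To make the link between a threshold-voltage shift and circuit delay concrete, the sketch below uses the alpha-power-law delay approximation, a common first-order model. It is only an illustration: the function names, the exponent `alpha = 1.3`, and the example voltages are assumptions and are not taken from the project.

```python
def gate_delay(vdd, vth, alpha=1.3, k=1.0):
    """Alpha-power-law estimate: delay is proportional to vdd / (vdd - vth)**alpha.

    k is an arbitrary technology constant (assumed), so only ratios are meaningful.
    """
    if vdd <= vth:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * vdd / (vdd - vth) ** alpha


def delay_increase(vdd, vth_fresh, delta_vth, alpha=1.3):
    """Relative delay increase caused by an aging-induced Vth shift."""
    fresh = gate_delay(vdd, vth_fresh, alpha)
    aged = gate_delay(vdd, vth_fresh + delta_vth, alpha)
    return aged / fresh - 1.0


if __name__ == "__main__":
    # Hypothetical operating point: 0.8 V supply, 0.30 V fresh threshold.
    for shift in (0.0, 0.02, 0.05):
        print(f"dVth = {shift:.2f} V -> delay +{100 * delay_increase(0.8, 0.30, shift):.1f} %")
```

Under this model, the same Vth shift costs more delay at lower supply voltages, which is one reason aging is usually evaluated at several operating points.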







# 2. EDA Tool Flow for Reliability Analysis

In this section, a detailed description of the **cross-layer EDA tool flow for reliable electronics design** is provided, along with the **research tasks** surrounding this effort. Despite significant progress in fault modeling and reliability evaluation, existing EDA solutions exhibit several critical limitations:

- **Isolated Fault Modeling**: Most existing approaches analyze reliability threats in isolation, lacking support for combined or simultaneous effects (e.g., SETs, SEUs, aging, TID, EMI).
- **Limited Scalability**: Current methodologies do not scale efficiently to large-scale, real-world designs that incorporate millions of transistors and diverse functional units.
- **Insufficient Delay Modeling**: Fault propagation flows often overlook gate and interconnect delays, especially from a timing analysis perspective critical for soft error masking.
- **Lack of Cross-Layer Integration**: Available flows rarely support consistent analysis across multiple abstraction levels (device, gate, system), limiting their applicability to modern complex SoCs.
- **Inadequate Support for AI/ML Hardware**: Tools specialized in analyzing AI accelerators, asynchronous NoCs, and other emerging architectures are currently lacking.
- **Limited Commercial Tool Capabilities**: Existing commercial tools (e.g., Cadence IFSS, IROC TFIT/SoCFIT) either ignore key physical parameters (like voltage and temperature) or are restricted to specific fault types (e.g., radiation-only soft errors), without offering support for design-level fault tolerance optimization.

The objective of TWIN-RELECT's EDA tool flow is to address the limitations of existing reliability analysis approaches through a holistic, scalable, and modular framework that bridges the gap between **fault analysis and design-for-reliability**. By integrating accurate fault modeling, from **device-level physical simulations** up to **system-level fault injection and evaluation**, the flow enables consistent and comprehensive reliability assessment across the entire design stack. In addition, the tool flow incorporates **fault mitigation techniques**, such as timing-aware optimization strategies, allowing designers not only to analyze but also to actively improve the resilience of electronic systems through structural and physical design transformations. This cross-layer integration allows models to be reused, fault effects to be tracked, and fault tolerance to be measured and improved across a wide range of hardware designs, including processors, memory, AI accelerators, and other system components.

## 2.1 Tools Involved in the Reliability Analysis Tool Flow

The key strength of the TWIN-RELECT tool flow is the integration of four cutting-edge software tools provided by the project partners into an EDA tool flow for analyzing and designing reliable circuits. These tools are detailed in the following subsections.







#### 2.1.1 PredicSEE

PredicSEE is a Monte Carlo-based simulation tool, developed at CNRS, which predicts the SEU cross-section of electronic structures such as memory cells, logic gates, and small circuits.

- **Inputs**: PredicSEE provides a straightforward interface with input parameters familiar to end users, and it dynamically varies the thickness of the depletion layer based on the potential at the node. Monte Carlo simulations inject the selected particles randomly across the device area. The simulation ends when one of the stopping criteria is reached: the target Monte Carlo accuracy or the particle fluence. To reduce CPU time, PredicSEE uses the DHORIN code, which provides a wide range of pre-calculated particle–matter interactions for protons and neutrons. For ions, primary and secondary ion transport and ionization are obtained through SRIM. Ions, neutrons, and protons can be considered, and their resulting ionization (direct and indirect) is simulated using the diffusion–collection model, which simplifies the collection and transport of carriers after the interactions.
- **Outputs**: The results show good agreement with experimental data, indicating that the SEU mechanism can be predicted accurately. The number of parameters can generally be reduced and simplified in each modeling phase, using available or approximate values from the literature. The objective of PredicSEE is to estimate the cross-section to within an order of magnitude, even without access to all parameters and characteristics of the device.
- Characterisation Results, associating particle energy to injected current as a function of time, *i.e.* I(t) waveforms
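The Monte Carlo procedure described above (random particle injection over the device area, with a stop on either statistical accuracy or fluence) can be sketched as follows. This is a toy model under stated assumptions, not PredicSEE's implementation: the sensitive-area fraction, the fixed upset probability, and all names are hypothetical stand-ins for the physics that the real tool computes.

```python
import random


def estimate_seu_cross_section(device_area_cm2, sensitive_fraction, upset_probability,
                               max_fluence_per_cm2, target_rel_error=0.05, seed=0):
    """Toy Monte Carlo estimate of an SEU cross-section in cm^2.

    Particles strike uniformly over the device area. A strike causes an upset
    only if it lands in the sensitive region and (with an assumed fixed
    probability) deposits enough charge.
    """
    rng = random.Random(seed)
    max_particles = int(max_fluence_per_cm2 * device_area_cm2)
    if max_particles < 1:
        raise ValueError("fluence too low to inject a single particle")
    particles = 0
    upsets = 0
    while particles < max_particles:        # stopping criterion 1: fluence reached
        particles += 1
        hits_sensitive_region = rng.random() < sensitive_fraction
        if hits_sensitive_region and rng.random() < upset_probability:
            upsets += 1
        # Stopping criterion 2: Poisson relative error 1/sqrt(upsets) is small enough.
        if upsets > 0 and upsets ** -0.5 <= target_rel_error:
            break
    fluence = particles / device_area_cm2   # particles per cm^2 actually simulated
    return upsets / fluence, particles, upsets


if __name__ == "__main__":
    sigma, n, k = estimate_seu_cross_section(1e-8, 0.2, 0.5, 1e13)
    print(f"sigma ~ {sigma:.2e} cm^2 from {k} upsets in {n} particles")
```

The cross-section is the number of upsets divided by the simulated fluence, so in this toy setup it converges toward the device area multiplied by the two probabilities.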

#### 2.1.2 ECORCE

ECORCE is a TCAD simulation tool based on a classical drift-diffusion model, developed at CNRS and distributed under the GPL. It facilitates TCAD modeling by providing an easy-to-use graphical user interface for defining the geometry and physical model of the devices, executing calculations, and analyzing results. By integrating a dynamic mesh generator, ECORCE frees the user from the meshing step for both DC and transient analysis. The tool also supports Single Event Effect (SEE) modeling and features a restricted diffusion add-on that accounts for the kinetics of the trapping-detrapping process in insulators. Together, these features allow ECORCE to model the SEE and Total Ionizing Dose (TID) response of MOS devices and to reproduce experimental results quantitatively.

- **Inputs**: It handles PN junctions, bipolar transistors, MOS transistors, IGBTs, and CMOS APS image sensors. It can be operated in 1D, 2D, and axisymmetric modes for both DC and transient modeling. ECORCE includes a model for SEE and a restricted diffusion add-on that deals with charge trapping-detrapping in insulators to account for dose effects in electronic components. Also included is a dynamic mesh generator that frees the user from the time-consuming meshing step for either DC or transient analysis.
- **Outputs**: Ions induce numerous effects in materials; ECORCE accounts for the electron/hole pairs generated along the ion track. To model the ionization, the tool needs a set of points (range, generated pairs) along the track. These values are deduced from the Linear Energy Transfer (LET; generally expressed in MeV·cm²/mg), which defines the energy deposited by an ion in a material.

- Characterisation Results, associating particle energy to injected current as a function of time, i.e. I(t) waveforms
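As an illustration of what an I(t) characterisation waveform can look like, the sketch below samples a double-exponential current pulse, a widely used analytical approximation for radiation-induced transients. The pulse shape, the time constants, and the collected charge are assumptions chosen for the example; the actual waveforms produced by PredicSEE and ECORCE come from their own physical models.

```python
import math


def set_current(t, charge, tau_rise, tau_fall):
    """Double-exponential current pulse (amperes) at time t (seconds).

    The prefactor is chosen so that the pulse integrates to `charge` (coulombs).
    Requires tau_fall > tau_rise.
    """
    if t < 0.0:
        return 0.0
    scale = charge / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def sample_waveform(charge, tau_rise, tau_fall, t_end, steps):
    """Return a list of (time, current) samples on a uniform grid."""
    dt = t_end / steps
    return [(i * dt, set_current(i * dt, charge, tau_rise, tau_fall))
            for i in range(steps + 1)]


def integrated_charge(waveform):
    """Trapezoidal integral of the sampled current, i.e. the collected charge."""
    total = 0.0
    for (t0, i0), (t1, i1) in zip(waveform, waveform[1:]):
        total += 0.5 * (i0 + i1) * (t1 - t0)
    return total


if __name__ == "__main__":
    # Hypothetical pulse: 10 fC collected, 5 ps rise and 50 ps fall time constants.
    wf = sample_waveform(10e-15, 5e-12, 50e-12, 1e-9, 10000)
    print(f"peak current: {max(i for _, i in wf) * 1e6:.1f} uA")
    print(f"collected charge: {integrated_charge(wf) * 1e15:.2f} fC")
```

A sampled list of (time, current) pairs like this is also a convenient form for a piecewise-linear current source in a circuit simulator.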

#### 2.1.3 UPSET

UPSET is a Single Event Transient (SET) analysis tool, developed by UTH, that supports both SET generation and propagation for whole circuits, based on Static Timing Analysis (STA). STA emulates signal propagation from all timing path start points, through combinational logic, to circuit endpoints by propagating both rise/fall delays and slews. It supports both the standard NLDM (Non-Linear Delay Model) and the more advanced CCS (Composite Current Source) timing model. By using UTH's STA engine for SET generation and propagation, UPSET models transient faults originating from particle strikes (i) with an acceptable loss of accuracy relative to SPICE, and (ii) using static analysis instead of simulation, resulting in a speedup of many orders of magnitude. A key strength of the tool is its support for widely used industry formats, such as Verilog, SPEF, and LEF, enabling seamless integration with existing industrial tools and workflows. This compatibility allows for efficient collaboration with established EDA environments and ensures that UPSET can be readily adopted in commercial design processes. The input and output formats are detailed below.

- Inputs:
  - **Verilog Netlist (.v):** The Verilog netlist describes the circuit's structure, including the interconnections between components (gates, flip-flops, etc.). It serves as the primary input for UPSET, allowing the tool to perform SET propagation analysis across the circuit.
  - **Liberty Format Timing Library (.lib):** This library contains timing models for the components in the design. It provides essential timing characteristics, such as delay values and slew rates, that are needed for Static Timing Analysis (STA) during fault propagation.
  - **Library Exchange Format (.lef):** The LEF file defines the physical layout information, such as the size of the cells and the technology used in the design. It is used in conjunction with the netlist and timing library to ensure the timing analysis takes into account the physical characteristics of the components.
  - **Layout in DEF form (.def):** The DEF file contains detailed information about the layout, such as cell placement and routing. It provides the necessary physical data to understand how components are arranged, which is crucial for accurate fault analysis.
  - **Metal RC Parasitics in SPEF form (.spef)** *(optional)*: This file contains RC parasitic information (resistance and capacitance of metal interconnects), which can be used to improve the accuracy of the timing analysis, particularly for interconnect delays.
  - **Switching Activity Interchange File (.saif)** *(optional)*: This file provides information on the switching activity of the design, which is used to estimate the likelihood of faults occurring in different paths and determine the impact of SETs.
  - **Tool Command Language Scripts (.tcl):** TCL scripts are used to automate and customize the execution of UPSET. These scripts can control the analysis flow and set up specific parameters for the tool's operation.

- Outputs:
  - **SET Analysis Report:** The primary output of UPSET is the SET analysis report, which details how Single Event Transients (SETs) propagate through the design. This report helps identify vulnerable paths in the circuit that may be susceptible to transient faults caused by radiation or other environmental factors.
  - **Radiation Hardened Design (WIP):** This represents the design's resilience against radiation-induced faults after undergoing fault mitigation strategies. This is part of the ongoing effort to improve the circuit's reliability in harsh environments and consists of two main files:
    - Modified Verilog Netlist (.v): After fault analysis and optimization, the tool outputs a modified Verilog netlist, reflecting the changes made to enhance fault tolerance. This netlist can then be used for further simulation or implementation.
    - Modified Layout in DEF form (.def): UPSET can also output a modified layout in DEF form if physical changes are made during the optimization phase. This updated layout ensures that the circuit's physical configuration is aligned with the changes made for fault tolerance.

#### 2.1.4 EMBER

EMBER (Extensible Microarchitectural Benchmark for Error Resilience) is a C++ class library for fault injection at the register transfer level (RTL), balancing hardware modeling accuracy and simulation speed. Jointly developed by the University of Manchester and IHP, it features cycle-based simulation and built-in fault injection support for parametric designs. More specifically, the assumption of zero gate delays (i.e., ignoring gate delays) enables cycle-based hardware simulations, which run faster than traditional event-driven simulations. EMBER targets the fast resilience analysis of highly configurable architectures by extending the traditional RTL simulation tool flow with a "design initialization" phase, allowing different parameter settings without recompiling the simulation. EMBER uses C++ meta-programming to separate datapath descriptions from unit implementations, enabling flexible design adaptation. Unlike traditional fault injection, which uses hardware or software saboteurs, EMBER uses mutant components that change behavior under fault injection. This eliminates the need for hardware/software customizations to insert faults and enables the flexible implementation of different fault models.

- Inputs:
  - **RTL description of an architectural component** (e.g., a deep learning accelerator). The RTL specification can be directly entered in EMBER by inheriting abstract classes and implementing their abstract methods. Alternatively, the HDL model can be converted through state-of-the-art tools such as Verilator into behavioural cycle-accurate C++ models, which are easily wrapped for EMBER integration due to the common use of the cycle-based simulation paradigm.
  - **Fault model**. From the characterization at lower layers of the design stack, a fault model compatible with the modelling abstraction of EMBER can be defined using mutant components.
  - For specific fault models (e.g., stuck-at faults), the RTL description can be selectively refined by exposing the **gate-level structure of a circuit subblock**. Standard gate-level netlists can be automatically imported in EMBER, where they are rendered through a library of zero-delay logic gate models.
- Outputs:
  - **Fault masking probability**. It is the likelihood that a fault introduced into the system will not lead to a detectable error or failure due to the system's inherent design or recovery mechanisms.
  - **Hardware simulation model suitable for fast Python integration**. The C++ nature of the model enables its straightforward integration into state-of-the-art deep learning frameworks, thus allowing it to propagate the effects of faults to the final application (e.g., degradation of the inference accuracy and/or of the score margin).
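The mutant-component idea and the fault masking probability output can be illustrated with a small cycle-based model. The sketch is written in Python for brevity and does not use EMBER's C++ API; the class names, the accumulator datapath, and the choice to observe only the low output bits are all hypothetical.

```python
class Register:
    """Cycle-based register: its value is updated once per clock cycle."""

    def __init__(self, width):
        self.mask = (1 << width) - 1
        self.value = 0

    def clock(self, data_in):
        self.value = data_in & self.mask


class MutantRegister(Register):
    """Mutant component: behaves like Register but flips one bit at one cycle."""

    def __init__(self, width, fault_cycle, fault_bit):
        super().__init__(width)
        self.fault_cycle = fault_cycle
        self.fault_bit = fault_bit
        self.cycle = 0

    def clock(self, data_in):
        super().clock(data_in)
        if self.cycle == self.fault_cycle:
            self.value ^= 1 << self.fault_bit   # inject the bit flip
        self.cycle += 1


def run_accumulator(inputs, reg, out_bits):
    """Accumulate the inputs; only the low `out_bits` bits are observed."""
    for x in inputs:
        reg.clock(reg.value + x)
    return reg.value & ((1 << out_bits) - 1)


def masking_probability(inputs, width, out_bits):
    """Exhaustive campaign over all (cycle, bit) pairs; fraction of masked faults."""
    golden = run_accumulator(inputs, Register(width), out_bits)
    masked = 0
    total = 0
    for cycle in range(len(inputs)):
        for bit in range(width):
            mutant = MutantRegister(width, cycle, bit)
            if run_accumulator(inputs, mutant, out_bits) == golden:
                masked += 1
            total += 1
    return masked / total


if __name__ == "__main__":
    print(masking_probability([3, 5, 7, 11], width=8, out_bits=4))
```

Because the fault is injected by swapping in a mutant object, the datapath function `run_accumulator` is left untouched, which mirrors the motivation given above for avoiding saboteurs. In this toy datapath, flips in the unobserved upper bits are always masked.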

## 2.2 Reliability Analysis Methodology

To allow for cross-layer reliability analysis, the EDA tool flow has been structured into two major domains:

- **Technology Analysis**, which focuses on device-level and transistor-level modeling of fault effects.
- **Design Analysis**, which enables gate-level timing-aware fault analysis and system-level resilience assessment.

A graphical representation of the overall flow is shown in Figure 1, illustrating the interactions between the tools and the flow of data across abstraction levels. This integration ensures that **early-stage device-level insights** (e.g., from TCAD and Monte Carlo simulations) directly inform later stages of **design analysis and optimization**.











#### 2.2.1 Reliability Analysis at Technology Level

The **technology-level analysis** represents the foundational layer of the TWIN-RELECT EDA tool flow and is divided into two sub-levels: **device-level** and **transistor-level** analysis.

- **Device-level analysis** is the lowest abstraction layer, providing a highly accurate representation of the physical mechanisms governing fault generation. It models the interaction of high-energy particles with semiconductor materials, accounting for effects such as ionization, charge transport, and trapping. At this stage, TCAD and Monte Carlo simulation techniques are used to simulate phenomena like SETs, SEUs, and TID.
- **Transistor-level analysis** is the next abstraction layer, modeling the electrical behavior of standard cell transistors under various reliability stressors, including both soft errors (e.g., SETs, SEUs) and permanent degradation due to TID or aging effects. This stage will involve SPICE simulations to accurately characterize the effects of fault generation at the standard cell level. The characterization procedure will also evaluate gate output delay and slew response under fault and aging conditions, providing essential data for higher-level reliability analysis.

#### 2.2.2 Reliability Analysis at Design Level

The **design-level analysis** is similarly structured into two sub-levels: **gate-level** and **system-level**.

- **Gate-level analysis** abstracts the detailed electrical behavior of individual transistors into timing models, enabling scalable fault analysis of entire circuits. This stage uses Static Timing Analysis (STA) to evaluate how faults, such as SETs, SEUs, or aging-induced delays, propagate through logic paths. The analysis identifies vulnerable gates and critical paths, and sets the foundation for reliability-driven optimization and system-level evaluation.
- **System-level analysis** is the highest abstraction layer of the analysis, operating at the microarchitecture level and targeting full systems comprising CPUs, AI accelerators, NoCs, or asynchronous blocks. System-level analysis forms the link between low-level reliability evaluation and the application domain, allowing designers to understand how faults impact actual workloads. It also helps validate gate-level optimizations and ensures that reliability improvements translate into better system-level robustness.
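To illustrate how a timing-based analysis can decide whether a transient survives a logic path, the sketch below applies a simple and commonly used electrical-masking rule: a pulse narrower than a gate's delay is filtered, a pulse wider than twice the delay passes unchanged, and pulses in between are narrowed. This rule, the function names, and the latching estimate are illustrative assumptions and do not describe the actual algorithm of any tool in the flow.

```python
def attenuate(width, gate_delay):
    """Pulse width at a gate output under a simple electrical-masking rule."""
    if width < gate_delay:
        return 0.0                           # pulse is filtered out entirely
    if width < 2.0 * gate_delay:
        return 2.0 * (width - gate_delay)    # pulse is narrowed
    return width                             # pulse passes unchanged


def propagate_set(width, path_delays):
    """Propagate a SET of the given width through a chain of gate delays."""
    for delay in path_delays:
        width = attenuate(width, delay)
        if width == 0.0:
            break                            # electrically masked on this path
    return width


def latching_probability(width, clock_period):
    """First-order chance that a pulse arriving at a flip-flop is captured."""
    return min(1.0, width / clock_period)


if __name__ == "__main__":
    # Hypothetical path with gate delays of 30 ps and 40 ps, 1 ns clock period.
    for w in (20.0, 50.0, 100.0):
        out = propagate_set(w, [30.0, 40.0])
        print(f"{w:.0f} ps in -> {out:.0f} ps out, "
              f"latch p = {latching_probability(out, 1000.0):.2f}")
```

Working with pulse widths and gate delays, rather than simulated waveforms, is what lets this style of analysis scale to whole circuits.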

## 2.3 Flow Integration and Tools Interaction

The TWIN-RELECT EDA Tool Flow is designed to provide a seamless, integrated environment for evaluating and improving the reliability of electronic systems. This flow is structured to incorporate multiple abstraction levels, from device-level up to system-level analysis. The integration of tools at each level ensures that fault modeling, analysis, and mitigation techniques are consistently applied and that data flows smoothly from one stage to the next. In this section, we describe how the key components of the flow interact with one another.

#### 2.3.1 Device-Level and Transistor-Level Interaction

The device-level analysis involves the **PredicSEE** and **ECORCE** tools, which simulate the physical interactions between high-energy particles and semiconductor materials, focusing on effects like Single Event Upsets (SEUs), Single Event Transients (SETs), and Total Ionizing Dose (TID). The data generated from these simulations, including current waveforms and ionization effects, form the foundational characterization data used by the SPICE-based transistor-level analysis.







**SPICE simulations** at the transistor level build upon this device-level data to model the electrical behavior of standard and custom cells under fault conditions. Lookup tables (LUTs) are generated, capturing the fault generation and timing behavior of individual gates. These LUTs then serve as the primary data input for the gate-level analysis. The interaction between these two stages is crucial, as it ensures that realistic, technology-specific fault effects are propagated into the higher-level analysis tool.
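A characterization LUT of this kind is typically queried by interpolation between the simulated grid points. The following sketch shows bilinear interpolation in a two-dimensional table; the choice of axes (input slew and output load), the table values, and the function name are assumptions made for the example, since the actual LUT layout is not specified here.

```python
from bisect import bisect_right


def interpolate_lut(slews, loads, table, slew, load):
    """Bilinear interpolation in a 2D LUT.

    slews, loads: ascending axis values (at least two entries each).
    table[i][j]:  characterized value at slews[i], loads[j].
    Points outside the grid are linearly extrapolated from the nearest cell.
    """
    def bracket(axis, x):
        # Index of the lower grid point and the fractional position in the cell.
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        return i, (x - axis[i]) / (axis[i + 1] - axis[i])

    i, ts = bracket(slews, slew)
    j, tl = bracket(loads, load)
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


if __name__ == "__main__":
    # Hypothetical 2x2 table: delay in ps versus input slew (ps) and load (fF).
    slews = [10.0, 50.0]
    loads = [1.0, 5.0]
    delay = [[20.0, 40.0],
             [30.0, 60.0]]
    print(interpolate_lut(slews, loads, delay, 30.0, 3.0))
```

The same lookup could serve for other characterized quantities, for example a generated SET pulse width indexed by deposited charge and load, by swapping the axes and table.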

#### 2.3.2 Gate-Level and System-Level Communication

The gate-level analysis is where the tool flow shifts from individual transistor modeling to evaluating the behavior of logic gates under fault conditions. At this stage UTH's tool, **UPSET**, performs STA-based fault analysis, leveraging the LUTs generated from device or transistor level simulations. This analysis assesses how faults, such as SETs and SEUs, propagate through combinational logic paths. The results help identify vulnerable gates and critical paths, which may require fault mitigation strategies. Building upon these findings, UPSET performs STA-based optimizations to improve the fault tolerance of the design. During this process, the tool carefully balances critical Performance, Power, and Area (PPA) metrics, ensuring that the optimized design not only improves reliability but also meets the required performance and resource constraints. These optimizations are iterative and guided by the results of the fault analysis, ensuring that fault mitigation does not compromise the overall design efficiency.
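The trade-off between reliability gain and PPA cost can be pictured as a budgeted selection problem. The sketch below ranks gates by vulnerability per unit of added area and hardens them greedily until an area budget is exhausted. It is a deliberately simple heuristic with hypothetical gate names and numbers, offered only as an illustration and not as the optimization algorithm used by UPSET.

```python
def select_gates_to_harden(gates, area_budget):
    """Greedy selection of gates to harden under an area budget.

    gates: list of (name, vulnerability, extra_area) tuples, where
           vulnerability is any non-negative score from the fault analysis and
           extra_area (> 0) is the area cost of hardening that gate.
    Returns the chosen gate names and the total area consumed.
    """
    ranked = sorted(gates, key=lambda g: g[1] / g[2], reverse=True)
    chosen = []
    used = 0.0
    for name, _vulnerability, extra_area in ranked:
        if used + extra_area <= area_budget:
            chosen.append(name)
            used += extra_area
    return chosen, used


if __name__ == "__main__":
    # Hypothetical analysis results: (gate, vulnerability score, area cost).
    gates = [("u1", 0.9, 3.0), ("u2", 0.5, 1.0), ("u3", 0.2, 2.0)]
    print(select_gates_to_harden(gates, area_budget=4.0))
```

In an iterative flow, the fault analysis would be re-run after each round of changes so that the vulnerability scores reflect the modified design.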

The system-level analysis bridges the gap between reliability evaluation and real-world application. In this stage, the **EMBER** tool developed by MAN/IHP performs cycle-accurate fault injection at the RTL or gate level to assess fault masking probabilities, fault coverage, and the overall robustness of the system under realistic operating conditions. Using realistic testbenches or input vectors, EMBER simulates how faults manifest within the system and determines whether they are masked or result in failures during actual workloads. This analysis provides critical insights into the design's resilience in real-world scenarios and helps validate the effectiveness of fault-tolerant mechanisms applied in earlier stages.

The interaction between the **gate-level analysis** and **system-level evaluation** is key to ensuring that faults identified at the gate level are correctly tested within the context of real-world applications. The **system-level analysis** provides critical feedback to earlier stages by identifying **system-level vulnerabilities**, validating the effectiveness of mitigation strategies, and guiding further optimizations.
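The notion of fault masking used above can be illustrated with a minimal sketch. The three-input circuit, the net names, and the exhaustive input sweep below are hypothetical and chosen only for illustration; they do not represent how EMBER or UPSET are implemented.

```python
from itertools import product

def circuit(a, b, c, flip=None):
    """Tiny combinational circuit y = (a AND b) OR c.

    `flip` names a net whose value is inverted, which models a single
    bit-flip fault injected on that net (hypothetical net names).
    """
    n1 = a & b
    if flip == "n1":
        n1 ^= 1
    y = n1 | c
    if flip == "y":
        y ^= 1
    return y

def masking_probability(net):
    """Fraction of input vectors for which a flip on `net` is not visible at y."""
    vectors = list(product((0, 1), repeat=3))
    masked = sum(circuit(*v) == circuit(*v, flip=net) for v in vectors)
    return masked / len(vectors)

for net in ("n1", "y"):
    print(net, masking_probability(net))
```

A flip on the internal net `n1` is logically masked whenever `c` is 1, whereas a flip on the output net is never masked. A real workload replaces the exhaustive sweep with testbench-driven input vectors.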

#### 2.3.3 Data Flow and Communication Between Tools

The flow between tools is designed to be **automated and seamless**, ensuring that data is passed from one tool to the next without manual intervention. The data flows between the tools as follows:

- 1. **Device-level to Transistor-level**: The data generated by PredicSEE and ECORCE feeds directly into SPICE simulations. These data include current waveforms, ionization effects, charge collection dynamics and threshold voltage shifts that occur during events such as SETs, SEUs and TID. These results are used to generate LUTs that describe the timing and reliability characteristics of standard cells. A key strength of the proposed interaction is that it allows modelling combinations of faults, like SET/SEU generation and propagation under TID or aging effects.







- 2. **Transistor-level to Gate-level**: The LUTs generated from SPICE simulations are passed to the UPSET tool, where they are used for STA-based fault analysis. This data helps model how SETs and SEUs are generated and propagate through logic gates, and helps identify potential vulnerabilities in timing paths. By feeding the technology-level data into the gate-level analysis, the flow ensures that accurate fault models are passed to higher abstraction layers, minimizing the risk of inaccurate fault propagation. Furthermore, the characterization data informs the reliability optimization algorithm about the overall resilience of each standard cell, guiding the decisions made during the optimization phase to enhance the design's fault tolerance.
- 3. **Gate-level to System-level**: The results from UPSET—including identified critical paths and fault propagation data—are handed off to EMBER, which performs cycle-accurate fault injection using RTL or zero delay gate-level models. This enables testing of the fault effects under real-world workloads, assessing the likelihood of faults propagating to the system's output. Additionally, EMBER will be used to evaluate the effectiveness of STA-based optimizations by simulating how the applied fault-tolerant techniques influence the system's overall robustness against faults. This evaluation ensures that the optimizations implemented at the gate level maintain their effectiveness in the context of full system operation. The results of the optimized design analysis from EMBER will then be fed back to UPSET for further optimization if necessary, ensuring that any remaining vulnerabilities are addressed and that the final design achieves the desired fault tolerance without compromising other performance metrics.
- 4. **System-level to Gate-level**: The system-level analysis can also serve as input for the gate-level analysis. EMBER fault injection RTL simulations can highlight critical portions of the design that may require more detailed analysis. This interaction enables a more targeted workflow, as these vulnerable areas can be directly targeted for gate-level analysis and optimization without the need to perform a sweeping gate-level analysis beforehand. This targeted approach ensures that fault mitigation strategies are applied precisely where they are needed most, enhancing the overall reliability of the design without unnecessary analysis of non-critical areas.







# 3. Research Objectives

The main objective of the TWIN-RELECT project is to advance the reliability of electronic systems, particularly in environments prone to radiation and other fault-inducing factors. This section outlines the primary research objectives that will guide the development of fault characterization, fault tolerance mechanisms, and optimization strategies that will be integrated into the EDA tool flow. The focus is on creating reliable and robust designs through comprehensive fault modeling and analysis, spanning various abstraction levels, from technology and transistor-level analysis to design-level optimizations. In addition, the project aims to define and evaluate new fault tolerance metrics, develop innovative optimization techniques, and benchmark the effectiveness of these solutions across a range of systems and architectures. Through these objectives, the project will ensure the long-term reliability of modern electronic systems, with particular attention given to emerging applications such as AI accelerators and space-grade electronics.

## 3.1 Characterization and Modeling of Fault Effects

The modeling and characterization of fault effects in state-of-the-art semiconductor technologies are crucial for ensuring the reliability of Integrated Circuits (ICs), considering different technology nodes and design solutions, especially in radiation-prone environments. The accurate design of fault models to analyze the impact of transient and permanent faults from the device to the circuit level, i.e., across different levels of abstraction, is the primary target of this procedure. In particular, the project's research focuses on modeling Single Event Effects (SEEs) and Total Ionizing Dose (TID) effects, utilizing both simulation tools and irradiation reliability experiments. UTH's contribution to all tasks described below will be significant, providing support in simulations, experiments, and the analysis of results, enhancing the overall research efforts.

#### <u>Characterization and modeling of fault effects in devices and circuits designed in advanced processes</u> (Leader: CNRS, Participants: All)

Fault effect modeling and characterization is a crucial process in the analysis of ICs and advanced semiconductor devices, led primarily by CNRS. The main goal is to evaluate the impact of radiation effects on both custom-designed devices and COTS IC devices designed in advanced technology nodes below 65nm. The reliability of ICs can be considerably affected in radiation-intensive environments; thus, this procedure plays a crucial role in ensuring their reliable long-term functionality and robustness. At the transistor level, radiation effects will be analyzed, utilizing the in-house TCAD simulator ECORCE, whereas the PredicSEE tool based on Monte Carlo simulation will be used to model SEs and SELs in both small circuits and complex electronic designs. These tools will contribute to predicting radiation effects and optimizing reliability assessment methodologies.

Finally, irradiation test campaigns with particle accelerators will play a pivotal role in the experimental validation of the model employed for characterizing radiation effects. Specifically, these tests will provide crucial data to enhance prediction accuracy and refine simulation methodologies, ensuring more precise reliability modeling for modern electronic designs.







#### <u>Characterization and modeling of fault effects in standard cells</u> (Leader: IHP, Participants: All)

The impact of SEs and aging effects on standard cells will be examined by IHP using SPICE simulations at the transistor level. To provide a more comprehensive evaluation of fault impact in semiconductor devices, data extracted from TCAD simulations will be utilized as inputs for this stage of the analysis, whereas a variety of technologies, including IHP's 130nm and 250nm technologies, as well as scaled technology nodes of 22nm and 28nm, will be used to offer a detailed analysis. In particular, to analyze how logic gates respond to both single and combined radiation effects, a series of SPICE simulations will be performed at the transistor level. Through this process, the vulnerability of the gates will be assessed, taking into account their behavior under various designs and operational conditions as well as the combined impact of multiple effects.

The Look-Up Tables (LUTs) generated from the characterization procedure through simulation results will be utilized at the gate level by the UPSET tool to perform STA-based fault analysis. At the same time, they will be leveraged to train Artificial Intelligence (AI) models designed to predict fault effects at the gate level. Specifically, the results from CNRS's device-level analysis will be integrated by IHP to automate the sensitivity evaluation of complex designs, thereby enhancing the accuracy of the models.

#### <u>Characterization and modeling of fault effects in application-specific cells</u> (Leader: MAN, Participants: All)

The reliability evaluation of AI/ML hardware and specialized logic cells for accelerated inference in artificial and spiking neural networks (ANNs and SNNs) will be performed by MAN. This evaluation is crucial for ensuring robust operation in critical applications. The analysis will consist of two phases. In the first phase, the reliability of sensitive architectural components will be investigated to identify fault effects for architectural-level analysis. In neuromorphic computing, multi-core processors handle neuro-synaptic processing, with the asynchronous network-on-chip (NoC) as a single point of failure. Critical components such as asynchronous pipelines, arbiters, and FIFOs will be characterized. In deep learning accelerators, the focus will be both on the MAC array, correlating its reliability to the architectural configuration (e.g., number of MAC units, multipliers per unit), and on exploring the accelerator control logic. All above components play a fundamental role in the architecture and operation-induced effects. In the second phase, the focus will be on the system-level architecture. Asynchronous NoC faults will be applied to neuromorphic processors while running spiking neural network simulations, while tensor error geometries and classification accuracy in deep learning frameworks will be analyzed when running on faulty hardware.

The generation and propagation analysis of Single Event Transient (SET) and Single Event Upset (SEU), i.e., disturbances caused by radiation-induced effects, will cover multiple abstraction layers based on electrical and gate-level simulations. The primary target is to analyze how disturbances propagate through components, extending the reliability evaluation from individual logic cells to architectural building blocks. This analysis will be achieved by utilizing SPICE simulation and based on data obtained from TCAD-based characterizations to improve accuracy. As a result, an accurate model for the robustness of AI hardware and digital systems will be established through the integration of cross-layer fault analysis. The goal is to provide reliable testing methodologies that enhance the reliability of modern semiconductor technologies in radiation environments.







## 3.2 Analysis of Fault-Tolerance Techniques and Definition of Evaluation Metrics

## 3.2.1 Characterization of fault tolerance mechanisms

In this task, various fault tolerance mechanisms will be studied at both gate and circuit levels, analyzing their effectiveness in mitigating SETs, SEUs, and aging effects while minimizing design overhead. The fault tolerance mechanisms that will be studied in this task are listed in <u>section 3.4.1</u>. A key aspect of this procedure is the detailed characterization of mitigation techniques based on the analysis conducted by the PredicSEE tool and SPICE simulation. The modeling will specifically focus on complex standard cells, taking into account the impact of supply voltage and temperature. Furthermore, the applicability of these techniques on a real design or at least a reasonably complex circuit will be demonstrated by utilizing different semiconductor technologies to evaluate the effectiveness of these approaches across various technology nodes.

The fault tolerance mechanisms will be assessed using both SPICE and RTL simulations to provide a detailed analysis of their behavior at multiple abstraction levels, whereas reference methodologies from prior work published by IHP will be considered to ensure a robust analysis. The ultimate objective is to determine the optimal combination of mitigation techniques with minimal design overheads, thereby enhancing the reliability of modern electronic systems.

## 3.2.2 Definition of reliability metrics

Well-defined reliability metrics are required to accurately evaluate IC vulnerability and assess the effectiveness of fault tolerance mechanisms. Existing metrics, such as Soft Error Rate (SER), Cross-Section, Failures in Time (FIT), and Mean Time to Failure (MTTF), provide valuable insights into system reliability. However, these metrics do not take into consideration the specific characteristics of different designs and fault types and do not thoroughly evaluate the impact of mitigation techniques. The following types of metrics will be investigated to overcome these limitations:

- 1. **Fault sensitivity metrics:** Evaluation of the gate and sub-circuit vulnerability to transient and permanent faults.
- 2. **Fault tolerance metrics:** The effectiveness of mitigation approaches in reducing system vulnerability will be assessed.
- 3. **Processing Radiation Reliability Metric (PRRM):** A structured approach that provides a more comprehensive and comparable evaluation of processing system reliability under radiation exposure.
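For reference, the existing metrics named above (Cross-Section, SER, FIT, MTTF) are related through simple arithmetic, sketched below. The error count, fluence, and flux values are illustrative assumptions, not project data, and the MTTF expression assumes a constant failure rate.

```python
import math

def cross_section(n_errors, fluence):
    """Cross-section in cm^2: observed errors divided by fluence (particles/cm^2)."""
    return n_errors / fluence

def ser_per_hour(sigma_cm2, flux_per_cm2_per_hour):
    """Soft error rate in errors/hour for a given particle flux."""
    return sigma_cm2 * flux_per_cm2_per_hour

def fit(rate_per_hour):
    """Failures in Time: expected failures per 1e9 device-hours."""
    return rate_per_hour * 1e9

def mttf_hours(rate_per_hour):
    """Mean time to failure, assuming a constant failure rate."""
    return 1.0 / rate_per_hour

# Illustrative numbers only: 50 errors observed at a fluence of 1e10 particles/cm^2,
# then scaled to an assumed operating flux of 13 particles/cm^2/hour.
sigma = cross_section(n_errors=50, fluence=1e10)
rate = ser_per_hour(sigma, flux_per_cm2_per_hour=13.0)
print(f"sigma = {sigma:.2e} cm^2, FIT = {fit(rate):.1f}, MTTF = {mttf_hours(rate):.3e} h")
```

These quantities describe the device as a whole, which is why the design-aware metrics listed above are needed to attribute vulnerability to specific gates, sub-circuits, or mitigation techniques.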

#### Key Objectives of Fault Sensitivity Metrics

- Identify the gates and sub-circuits of a circuit that are most susceptible to transient and permanent faults, based on their functional and structural characteristics.
- Model the propagation of faults through gates and sub-circuits, assessing their impact on overall circuit behavior.
- Evaluate the vulnerability of components to different process variations, including aging and manufacturing defects, and how these factors increase their susceptibility to radiation effects.
- Determine how gates and sub-circuits can withstand radiation disturbances and maintain correct functionality after the impact of transient or permanent faults.

#### **Key Objectives of Fault Tolerance Metrics**

- Evaluate the ability of fault tolerance mechanisms to detect faults in real time and prevent system failures.
- Assess the effectiveness of various mitigation strategies in improving IC reliability by reducing vulnerability to radiation-induced faults.
- Analyze the trade-offs between radiation robustness and overheads in terms of performance, power, and area (PPA).
- Apply fault tolerance strategies to different system configurations and technology nodes, ensuring their scalability across diverse architectures.

#### Key Objectives of PRRM

- Standardize radiation reliability reporting for high-performance processors.
- Integrate error cross-sections and performance benchmarking to assess system robustness.
- Define a clear error classification framework based on error criticality and impact.
- Ensure fair comparisons across different platforms by establishing eligibility criteria for benchmarking.

Together, the aforementioned approaches will provide a comprehensive framework for assessing and enhancing IC reliability and robustness. Evaluating the vulnerability of components and the effectiveness of mitigation strategies, along with standardized reporting through PRRM, ensures more accurate, comparable, and scalable evaluations across various system architectures. Finally, these reliability metrics will provide insights that contribute to improving the reliability of processing systems in radiation-prone environments, facilitating the design of more resilient and efficient electronic technologies.

## 3.3 STA-based Fault Analysis

As part of the TWIN-RELECT research objectives, the UPSET tool developed by UTH will undergo significant enhancements to provide a fast, accurate, and scalable Static Timing Analysis (STA)-based fault analysis framework. This tool will support a broad range of fault mechanisms, including Single Event Transients (SETs), Single Event Upsets (SEUs), as well as permanent degradation effects like Total Ionizing Dose (TID) and aging. Furthermore, to offer a more comprehensive reliability analysis, the tool will be extended to support combinations of multiple fault effects, enabling more realistic evaluations of system resilience.

For fault generation, UPSET will integrate results from technology-level characterization (e.g., PredicSEE, ECORCE, and SPICE) to enable accurate SET waveform and SEU generation using pre-characterized Lookup Tables (LUTs). With respect to aging analysis, UPSET will be extended to model long-term degradation effects. Delay derates for standard cells will be derived from technology-specific LUTs and analytical models, such as transistor threshold voltage shifts. Moreover, Machine Learning (ML) techniques will be explored to enhance the accuracy and scalability of SET/SEU generation, propagation, and aging analysis. These ML models will be trained using data produced by SPICE and RTL simulations (Tasks 1.2 and 1.3), enabling reliable predictions for larger, more complex designs where full simulation is not feasible.
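A pre-characterized LUT is typically queried between its characterization points. The sketch below shows one common way to do this, bilinear interpolation over two sorted axes. The axis choice (collected charge and output load), the table values, and the function names are hypothetical; the actual UPSET LUT format is not specified here.

```python
from bisect import bisect_right

def _bracket(axis, x):
    """Index i of the table cell [axis[i], axis[i + 1]] used for x, clamped to the table."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def lut_lookup(rows, cols, table, r, c):
    """Bilinear interpolation in a 2D LUT indexed by two sorted axes.

    Queries outside the table are linearly extrapolated from the edge cell.
    """
    i, j = _bracket(rows, r), _bracket(cols, c)
    tr = (r - rows[i]) / (rows[i + 1] - rows[i])
    tc = (c - cols[j]) / (cols[j + 1] - cols[j])
    top = table[i][j] * (1 - tc) + table[i][j + 1] * tc
    bot = table[i + 1][j] * (1 - tc) + table[i + 1][j + 1] * tc
    return top * (1 - tr) + bot * tr

# Hypothetical characterization data: SET pulse width (ps) as a function of
# collected charge (fC, rows) and output load (fF, columns).
charge_fc = [10.0, 20.0, 40.0]
load_ff = [1.0, 2.0, 4.0]
set_width_ps = [
    [100.0, 80.0, 50.0],
    [200.0, 160.0, 110.0],
    [350.0, 300.0, 220.0],
]

print(lut_lookup(charge_fc, load_ff, set_width_ps, 15.0, 1.5))
```

A learned model, as envisaged above, would replace the table lookup with a regression over the same inputs.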

To improve the accuracy of SET propagation modeling, the UPSET STA engine will incorporate advanced timing delay calculation models, including support for the Composite Current Source (CCS) timing model. CCS enhances timing modeling by using current waveform LUTs, simulating standard cell behavior while accounting for nonlinearities in the cell's response to different input slews and output loads. This results in more accurate estimates of delay and slew during SET propagation, improving the correlation with SPICE-level simulations. Additionally, the tool's interconnect modeling will be enhanced with Pi-model representations and full support for Distributed RC (SPEF), replacing the current simplified lumped RC networks. This enables more precise, post-layout parasitic-aware analysis.
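The benefit of distributed over lumped interconnect modeling can be seen with a first-order estimate. The sketch below computes the Elmore delay at the far end of a wire split into equal RC segments; the wire resistance and capacitance are illustrative values, and Elmore delay is used here only as a simple stand-in for the delay calculation of a full STA engine.

```python
def elmore_far_end(r_total, c_total, n_segments):
    """Elmore delay at the far end of a wire modeled as n equal RC segments.

    Each segment is a series resistor followed by a capacitor to ground.
    n_segments = 1 is the lumped model; large n approaches a distributed line.
    """
    r = r_total / n_segments
    c = c_total / n_segments
    # Capacitor k is charged through the k resistors upstream of it.
    return sum((k * r) * c for k in range(1, n_segments + 1))

R_WIRE = 100.0      # ohms, illustrative
C_WIRE = 100e-15    # farads, illustrative

for n in (1, 2, 10, 100):
    print(n, elmore_far_end(R_WIRE, C_WIRE, n))
```

The lumped model yields a delay of R·C, while the segmented model converges towards R·C/2 as the number of segments grows, so a lumped network overestimates the wire delay by up to a factor of two under this metric.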

Finally, accounting for On-Chip Variation (OCV) during SET analysis is crucial for accurate reliability evaluation in modern VLSI designs. OCV refers to small, unpredictable variations in process parameters (e.g., threshold voltage, channel length), supply voltage, and temperature across the same chip. These variations can significantly affect the delay characteristics of logic gates and interconnects, altering SET propagation behavior. For instance, a gate impacted by local variation may exhibit slower or faster switching, which can either mask or amplify the effect of an SET pulse. Ignoring OCV could lead to incorrect assumptions about timing, ultimately underestimating or overestimating a design's vulnerability to soft errors. Variation-aware timing models, such as Parametric OCV (POCV) and Advanced OCV (AOCV), will be incorporated into the STA engine to capture these spatial and temporal fluctuations. This will lead to more accurate fault propagation analysis and help identify vulnerable timing paths that require hardening, ultimately enhancing the design's reliability and safety margins—especially in mission-critical applications.
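The masking or amplifying effect of variation described above can be made concrete with a small sketch. It combines a simplified piecewise-linear pulse attenuation rule with flat early/late derate factors. Both the attenuation rule and the derate values are assumptions made for illustration; they are not the models implemented in UPSET, and real AOCV/POCV derates depend on path depth, distance, or per-cell statistics.

```python
def propagate_set(width_in, gate_delay):
    """Output SET width after one gate (simplified piecewise-linear rule).

    Pulses narrower than the gate delay are filtered, pulses wider than
    twice the gate delay pass unchanged, and pulses in between are attenuated.
    """
    if width_in < gate_delay:
        return 0.0
    if width_in < 2.0 * gate_delay:
        return 2.0 * (width_in - gate_delay)
    return width_in

NOMINAL_DELAY_PS = 50.0
SET_WIDTH_PS = 52.0
DERATES = {"fast": 0.9, "nominal": 1.0, "slow": 1.1}  # illustrative flat derates

for corner, derate in DERATES.items():
    delay = NOMINAL_DELAY_PS * derate
    print(corner, propagate_set(SET_WIDTH_PS, delay))
```

With these numbers the same 52 ps pulse is fully masked at the slow corner and emerges noticeably wider at the fast corner than at nominal, which is why a single-corner analysis can either under- or overestimate vulnerability.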

## 3.4 STA-based Optimizations

Following the fault analysis performed with the STA simulation engine, the design enters an optimization phase focused on improving its resilience to both transient and permanent faults. This phase incorporates a set of optimization techniques aimed at reducing the circuit's sensitivity to these faults, while also ensuring that Performance, Power, and Area (PPA) metrics are effectively balanced.

## 3.4.1 Optimization Techniques

Several techniques can be applied to improve the circuit's fault tolerance at the gate level. These techniques are selected based on their effectiveness in reducing SET sensitivity while considering the associated overheads in power, performance, and area. Some of the key techniques that will be investigated during the project to be integrated into the STA-based optimization flow include:

- **Pin Assignment/Rewiring**: This technique optimizes the assignment of pins to logic gates to improve power and performance. By taking advantage of the symmetry in the inputs of certain gates, it reduces the SET propagation by minimizing the impact of the fault on sensitive inputs.
- **Gate Resizing**: A commonly used method, gate resizing involves replacing smaller gates with larger ones to increase their capacitance. This reduces the likelihood of a particle strike causing significant voltage glitches while maintaining sufficient drive strength for SET propagation.







- **Load Unbalancing**: Inserting redundant logic to increase the output capacitance of a gate can help dissipate particle-induced charge more effectively, thus mitigating SET/SEU effects.
- **Fan-out Decomposition**: This technique involves duplicating gates to balance the load and reduce the susceptibility of gates to direct particle hits. Fan-out decomposition lowers the load of the target gate, decreasing its vulnerability to SETs, while considering the impact on interconnect delays.
- **Triple Modular Redundancy (TMR)**: For critical gates, TMR introduces redundancy by triplicating the gate and using a majority voter to filter out SET and SEU pulses. While this technique adds significant overhead, it serves as a last-resort optimization for highly sensitive areas.
- **SET Filter Insertion**: SET filters are used to reduce the pulse width of SETs by introducing delay lines or guard gates, effectively filtering out SETs below a certain threshold.
- **Insertion of Cascaded Inverters**: This technique helps in SET filtering by adding inverters in the logic path to reduce the SET pulse width. The difference in drive strength between cascaded inverters determines the filtering efficiency, though it also introduces power and delay overhead.
- **SET-driven Placement**: Optimization at the placement level can minimize SET sensitivity by placing critical gates in positions that increase interconnect lengths, which in turn helps reduce SET pulse width and improve fault resilience. Additionally, spacing memory elements (flip-flops, latches) apart reduces the probability of a particle strike affecting more than one member of the same TMR triplet.
- **Insertion of Aging/TID Monitors**: Aging monitors track the long-term degradation of integrated circuits caused by aging mechanisms. These monitors can guide dynamic aging-aware optimizations, such as selective re-timing or voltage scaling, ensuring that the circuit continues to meet performance requirements as it ages without compromising reliability.
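As a brief illustration of the TMR entry in the list above, the majority voter reduces to a two-out-of-three function. The sketch below models it bitwise on integer words; the word values are arbitrary examples.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over three redundant copies."""
    return (a & b) | (b & c) | (a & c)

golden = 0b1010
upset_copy = golden ^ 0b1000  # one copy with a single flipped bit

# A single corrupted copy is outvoted by the two correct ones.
print(bin(majority(golden, golden, upset_copy)))

# If two copies carry the same flip, the voter outputs the wrong value,
# which is why the placement guidance above spaces triplet members apart.
print(bin(majority(golden, upset_copy, upset_copy)))
```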

## 3.4.2 Closed-loop Optimization

The STA-based optimization process operates in an iterative, closed-loop manner, integrating the most suitable fault tolerance mechanisms based on the analysis of WP1.3. After applying the initial set of SET mitigation techniques, the tool re-evaluates the fault sensitivity of the design. If additional optimization is required, further techniques can be applied, with each iteration refining the design's robustness. The optimization process continues until the SET sensitivity is reduced to an acceptable level without introducing excessive PPA overhead. This closed-loop approach allows for efficient targeting of critical portions of the design, applying fault mitigation techniques selectively, and ensuring that each optimization step delivers meaningful improvements without unnecessary resource usage.
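The closed-loop behavior described above can be sketched as a greedy loop. Everything in this example is hypothetical: the per-gate sensitivity scores, the reduction factors and area costs of each technique, and the cheapest-first selection policy are placeholders for the analysis and cost models the tool flow will provide.

```python
def harden(sensitivity, target, area_budget, techniques):
    """Greedy closed-loop hardening sketch.

    sensitivity: dict mapping gate name to a SET sensitivity score
    techniques:  list of (name, reduction_factor, area_cost) tuples
    Stops when total sensitivity <= target, or when the most sensitive
    gate has no remaining technique that fits the area budget.
    """
    sens = dict(sensitivity)
    applied, used, area = [], set(), 0.0
    while sum(sens.values()) > target:
        # Re-analysis step: find the currently most sensitive gate.
        gate = max(sens, key=sens.get)
        options = [
            t for t in techniques
            if (gate, t[0]) not in used and area + t[2] <= area_budget
        ]
        if not options:
            break
        # Selection policy (assumed): try the cheapest technique first.
        name, factor, cost = min(options, key=lambda t: t[2])
        sens[gate] *= factor
        area += cost
        used.add((gate, name))
        applied.append((gate, name))
    return applied, sens, area

gates = {"g1": 0.5, "g2": 0.3, "g3": 0.1}
techniques = [("resize", 0.5, 1.0), ("tmr", 0.1, 6.0)]
applied, remaining, area = harden(gates, target=0.4, area_budget=10.0, techniques=techniques)
print(applied, area)
```

In this run the expensive technique is only reached after the cheaper one has been exhausted on the worst gate, mirroring the role of TMR as a last-resort optimization.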

## 3.5 Benchmarking and Scalability Testing

A structured benchmarking and scalability testing approach will be implemented to validate the TWIN-RELECT EDA Tool Flow depicted in Figure 1. A diverse set of benchmark circuits will be utilized to take into consideration the increasing complexity of modern ICs and analyze their vulnerability to transient and permanent faults caused by radiation effects. Therefore, evaluating the accuracy and scalability of the proposed tool across different technology nodes and architectures is essential.

The benchmark circuits will be selected to cover a wide range of realistic designs, so that the detection and modeling of radiation-induced disturbances by the proposed EDA Tool Flow can be evaluated. These benchmarks will include:

- Circuits of various complexities, ranging from simple few-gate designs to large-scale architectures with over 100k gates.
- Various technology nodes, including Commercial Off-The-Shelf (COTS) ICs designed in sub-65nm technologies, selected by CNRS.
- A streamlined version of the industry-grade NVDLA deep learning accelerator, developed by MAN and IHP, will be provided as a benchmark for reliability analysis, with extended configuration options not available in the original open-source repository. Different RTL models will be generated from the configuration space, covering datapath bitwidths, buffer sizes, MAC array dimensions, adder tree depths, accumulator entries, and SDP throughput. These models will help analyze fault propagation from hardware to the application level and correlate reliability with accelerator configurations. The study will offer new insights into the performance-accuracy-reliability trade-off in deep learning accelerators.
  - Sensitive subcomponents of the accelerator will be used for small-scale reliability analysis. On the datapath, the focus will be on the MAC array, whose footprint makes it difficult to protect in a cost-effective way. On the control path, the focus will be on the local control logic of the datapath components as well as on the global accelerator controller.
- Diverse processing architectures, including:
  - RISC-V multi-core platforms designed by IHP to evaluate the open-source processor fault tolerance.
  - Custom microcontrollers for space applications developed in the EU MORAL project, where the evaluation of sensitivity to radiation effects is vital.
  - Complex communication modules, such as baseband processors, to test fault effects in advanced data processing architectures.
- The TaBuLA asynchronous network-on-chip, developed by MAN, will be provided as a state-of-the-art benchmark for the reliability analysis of a sensitive component of emerging multicore neuromorphic processors for edge computing applications. It combines instantiation-time flexibility with silicon efficiency and hardware support for multicast, making it a representative benchmark for the requirements of neuromorphic communications. It will push the boundaries of the TWIN-RELECT tool flow, bringing its analysis and optimization capabilities within reach of asynchronous design.
  - Sensitive subcomponents for small-scale reliability analysis include the Mousetrap asynchronous pipeline, the asynchronous arbiters with balanced tree structures, and low-latency asynchronous FIFOs.













# 4. Collaboration and Coordination Among Partners

As part of the joint research activities, UTH members will take part in long-term training focusing on the impact of radiation effects on ICs. Both theoretical and hands-on training will be received during this period, covering issues associated with simulation methodologies and radiation experiments. Furthermore, the goal of the irradiation testing, carried out through the collaboration between CNRS and IHP, is to analyze the vulnerability of distinct architectures and technology nodes to transient and permanent faults caused by various radiation effects.

## 4.1 Joint Research Through Long-term Training of ESRs

The long-term training activities, presented in detail in Deliverable D2.1, will be a key enabler in fostering joint research between UTH and the partner institutions (i.e., IHP, CNRS, and MAN). The on-site training of UTH researchers in advanced research environments and their involvement in specialized training programs will ensure that research extends beyond theoretical learning to practical applications, fully aligning with the goals of this project. At the same time, the collaboration between the partners and the coordination of the training activities should allow for seamless progress throughout the project's lifetime. To this end, regular meetings will be held to monitor the progress of the training activities while assessing their contribution to the overall objectives of the project.

During the training activities, each institution will leverage its unique expertise not only to enhance the technical skills of the UTH researchers but also to contribute substantially to advancing joint research initiatives and to lay the foundation for a long-term research collaboration among partners. At IHP, hands-on training in fault characterization, selective fault tolerance, and AI-assisted fault analysis will provide UTH researchers with the necessary experience to develop novel fault-tolerant circuits. In addition, their direct involvement in real testing experiments, such as radiation and EMI tests, will strengthen their practical skills and expertise, fostering collaborative research and leading to high-impact publications and innovative methodologies for fault-tolerant systems. At CNRS, deep engagement with the in-house reliability tools for radiation effects simulation and with the design of experimental setups for reliability assessment will support cooperative research on new areas of semiconductor dependability. The knowledge acquired during these immersive internships at IHP and CNRS will be the foundation for joint research aimed at enhancing the resilience of modern ICs. As a complementary but equally significant aspect of the project objectives and research, the training at MAN will focus on fault tolerance of AI systems and neuromorphic computing. UTH researchers will be trained by experts in analyzing fault effects in spiking neural networks and asynchronous interconnection networks, enhancing their research capacity in this emerging field. The expertise gained will enable UTH staff to contribute to joint research efforts on AI-driven fault-tolerant computing, a rapidly growing area with significant implications for next-generation computing technologies.

Overall, the long-term training at the partner institutions will ensure that the knowledge gained by UTH researchers will be applied in practice to meet the objectives of the project. At the same time, these exchanges will enhance the potential for joint research, leading to high-impact publications and the development of innovative solutions, and paving the way for future joint proposals and sustained partnerships between UTH and its international partners.









## 4.2 Joint Irradiation Experiments

Irradiation experiments will be conducted by exposing semiconductor devices to protons, heavy ions, and neutrons to assess and enhance the accuracy of the models designed. These experiments are essential to validate IC reliability and the efficiency of AI-based systems in extreme radiation environments, including avionics, space, and nuclear applications. The primary purpose of incorporating experimental data is to evaluate the effectiveness of the simulation methods utilized for fault modeling and of the mitigation techniques integrated into the developed TWIN-RELECT EDA Tool Flow (Figure 1).

#### Irradiation test campaigns:

- High-energy cosmic radiation will be emulated using **heavy ions and protons**. These particles are representative of space radiation environments and can cause SEEs, including SELs and multiple transient faults, in electronic systems. Based on the data extracted from the results, the reliability of Commercial Off-The-Shelf (COTS) IC devices will be evaluated.
- One **neutron irradiation** test campaign is planned at a particle accelerator facility to emulate atmospheric radiation conditions. The reliability of critical applications, including automotive and avionics systems, is considerably affected by neutron-induced soft errors, as these disturbances can affect system functionality.
- **TID** testing will be performed using gamma/X-ray irradiation to model the prolonged and cumulative effect of radiation on electronic systems. These experiments will evaluate and analyze the long-term degradation of circuit components, transistors, and memory cells, providing useful insights for enhancing circuit robustness in radiation environments.
- Experiments will be performed by exposing selected chips to increased supply voltages and elevated temperatures to model the impact of **accelerated aging** on ICs. The degradation in IC performance will be observed through continuous monitoring of device behavior. Furthermore, any lasting or long-term effects caused by the stress conditions will be evaluated by post-experiment analysis. The results will provide useful information on IC robustness and lifespan, contributing to the design of reliable chips for integration into critical applications, including the military, aerospace, and automotive industries.
- Aging and TID are critical factors in analyzing chip performance after prolonged radiation exposure. ECORCE can provide results on the transistor threshold voltage shift caused by continuous radiation exposure. This information is crucial for analyzing the performance of a chip after a given period of exposure. Furthermore, it can be used to characterize different technologies and to create new timing LUTs describing the performance of each standard cell after varying levels of radiation exposure. These LUTs can be used by UPSET for SET propagation to evaluate the robustness of the design under TID effects. Finally, these threshold voltage shifts could be loaded into PredicSEE to measure how the impact of ions on a gate changes after continuous radiation exposure.
- A **laser irradiation** experiment will be performed at IHP, using an in-house laser setup designed to simulate real ionizing radiation effects. This procedure provides a straightforward way to examine the impact of soft errors on the operation of selected test chips.
- The impact of conducted **electromagnetic interference (EMI)** on the reliability and performance of chips will be assessed by a comprehensive experiment. Specifically, conducted noise will be injected through the power supply cable to simulate real-world disturbances. The purpose of the experiment is to identify the threshold at which transient faults occur, evaluating their effects on data integrity and chip functionality. The results will provide valuable data for designing robust ICs suited for EMI-prone environments across various sectors, such as telecommunications, automotive, aerospace, and medical devices.
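As an illustration of how a threshold voltage shift could be turned into updated timing LUT entries, as described in the aging and TID item above, the following minimal sketch scales cell delays with the alpha-power law delay model. This is an assumed first-order model chosen for illustration only; it is not the characterization method used by ECORCE or UPSET, and the function names, the `alpha` value, and the sign convention (a positive `dvth` increases the threshold magnitude) are hypothetical.

```python
def delay_scale(vdd, vth0, dvth, alpha=1.3):
    """Ratio of degraded to fresh gate delay under the alpha-power law.

    Assumed model: delay is proportional to vdd / (vdd - vth) ** alpha.
    vdd, vth0 and dvth are in volts; alpha is a technology-dependent
    fitting parameter (1.3 is only a placeholder value).
    """
    vth_aged = vth0 + dvth
    if vdd <= vth0 or vdd <= vth_aged:
        raise ValueError("supply voltage must exceed the threshold voltage")
    fresh = vdd / (vdd - vth0) ** alpha
    aged = vdd / (vdd - vth_aged) ** alpha
    return aged / fresh


def derate_lut(lut, vdd, vth0, dvth, alpha=1.3):
    """Return a copy of a 2-D delay table with every entry scaled.

    `lut` is a list of rows of delays (any time unit); the same scale
    factor is applied to all entries, which ignores slew/load dependence.
    """
    k = delay_scale(vdd, vth0, dvth, alpha)
    return [[d * k for d in row] for row in lut]


if __name__ == "__main__":
    fresh_lut = [[10.0, 14.0], [12.0, 18.0]]  # hypothetical delays in ps
    aged_lut = derate_lut(fresh_lut, vdd=1.0, vth0=0.30, dvth=0.05)
    print(aged_lut)
```

A uniform scale factor is the simplest possible choice; an actual flow would more likely re-characterize each standard cell per dose level, since the shift differs between NMOS and PMOS devices and delay also depends on input slew and output load.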

In the context of TWIN-RELECT, CNRS and IHP will cooperate to perform experiments that study the impact of irradiation on semiconductor technologies. Both institutions have extensive experience in designing fault-tolerant systems based on the analysis provided by radiation testing. The radiation tests will leverage the methods described above to study the effects caused by the different radiation sources and stress factors. Furthermore, the radiation simulation tools ECORCE and PredicSEE will be used to predict radiation effects and to optimize the methodologies employed to evaluate IC sensitivity. Through this collaboration, IHP and CNRS aim to provide a comprehensive evaluation of system robustness and to ensure the reliable performance of technologies under extreme conditions.
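IC sensitivity in such campaigns is commonly summarized by an event cross-section (observed events divided by particle fluence) and, for neutron tests, by a soft error rate in FIT (failures per 10^9 device-hours). The sketch below shows this standard arithmetic only; the function names and example numbers are hypothetical, and the reference flux of 13 n/cm²/h (neutrons above 10 MeV, New York City sea level, as commonly quoted from JEDEC JESD89A) is an assumption that should be checked against the standard and the target environment.

```python
def cross_section(n_events, fluence_per_cm2, n_bits=1):
    """Event cross-section in cm^2 (per bit when n_bits is given).

    n_events: number of observed events (e.g. upsets) during the run.
    fluence_per_cm2: time-integrated particle flux in particles/cm^2.
    """
    if fluence_per_cm2 <= 0 or n_bits <= 0:
        raise ValueError("fluence and bit count must be positive")
    return n_events / (fluence_per_cm2 * n_bits)


def fit_rate(sigma_cm2, flux_per_cm2_hour):
    """Failure rate in FIT (failures per 1e9 device-hours)."""
    return sigma_cm2 * flux_per_cm2_hour * 1e9


if __name__ == "__main__":
    # Hypothetical run: 120 upsets observed after 1e10 neutrons/cm^2.
    sigma = cross_section(n_events=120, fluence_per_cm2=1e10)
    # Assumed reference flux; verify against JESD89A before use.
    print(sigma, fit_rate(sigma, flux_per_cm2_hour=13.0))
```

With these hypothetical numbers the device cross-section is 1.2e-8 cm² and the estimated rate is about 156 FIT. Comparing such measured values with the cross-sections predicted by simulation is one way the experimental data can be used to assess the fault models.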







# 5. Publications Strategy

As part of the TWIN-RELECT project, a comprehensive publication strategy will be implemented to disseminate the findings and advancements made throughout the project. The goal is to share the results of our research with the broader scientific and engineering communities, particularly those working on reliability in electronic systems and fault tolerance. Publications will be targeted toward high-impact peer-reviewed journals and international conferences focused on reliability analysis, fault-tolerant design, and EDA tools.

## 5.1 Identified Key Research Topics & Trends

Several key research topics and emerging trends have been identified that are central to the development of fault-tolerant systems and the enhancement of reliability in electronic designs. The following topics represent the areas of greatest significance and interest within the reliable electronics community, forming the foundation for more resilient systems in both academic and industrial applications. These topics, which align with the project's research objectives detailed earlier, will also shape the publications strategy of the project, reflecting ongoing advancements in the field.

#### **Radiation Effects on Electronics:**

- Single Event Effects (SEE): SEU, SET, SEL, SEGR, SEB
- Total Ionizing Dose (TID) impact on semiconductor devices
- Displacement Damage (DD) in deep-space electronics
- Radiation Hardening Techniques (hardware and software)

#### Aging & Reliability Issues in Electronics:

- Bias Temperature Instability (BTI)
- Hot Carrier Injection (HCI)
- Electromigration (EM) in integrated circuits
- Time-Dependent Dielectric Breakdown (TDDB)
- Combined effects of radiation & aging in long-term electronic reliability

#### Reliability Analysis of Accelerators for ANNs and SNNs:

- Error geometries of output tensors in deep learning accelerators
- Accelerator datapath: correlating reliability analysis to model parameters and hardware configurations
- Accelerator control path: reliability analysis of accelerator control logic
- Robustness of the address-event representation routing function in neuromorphic processors
- Protection of bundled-data pipelines from transient faults
- Characterizing communication fault effects on spiking neural network simulation







## 5.2 Target High-Impact Journals & Conferences

#### Journals:

- IEEE Transactions on Nuclear Science (TNS)
- IEEE Transactions on Device and Materials Reliability (TDMR)
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI)
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
- Microelectronics Reliability (Elsevier)
- Radiation Effects and Defects in Solids (Taylor & Francis)
- Journal of Electronic Materials (Springer)

#### **Conferences:**

- IEEE Nuclear and Space Radiation Effects Conference (NSREC)
- European Conference on Radiation and Its Effects on Components and Systems (RADECS)
- IEEE International Reliability Physics Symposium (IRPS)
- International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA)
- Design, Automation and Test in Europe Conference (DATE)
- Design Automation Conference (DAC)
- IFIP International Conference on Very Large Scale Integration (VLSI-SoC)
- IEEE International Symposium on Circuits and Systems (ISCAS)

#### **Open-Access & Preprint Platforms:**

- Share preprints on **arXiv**, **TechRxiv**, or **HAL** before formal publication.
- Consider open-access options for wider reach (e.g., IEEE Access, MDPI's Electronics).

#### Public Outreach & Knowledge Dissemination:

- Publish technical blogs on platforms like **Medium, ResearchGate, or LinkedIn**.
- Develop industry white papers for automotive, space, and medical electronics sectors.
- Organize workshops at international symposia.

As presented in *Deliverable D6.1: Dissemination, Communication and Exploitation Plan,* the TWIN-RELECT project prioritizes the dissemination of research findings through leading journals and prominent conferences in the field of reliability and robust electronic system design. The targeted venues outlined in *D6.1*, and the additional ones included in this deliverable, have been carefully selected to enhance the project's visibility and impact. Through these publications, the project aims to engage with the scientific community, fostering collaborations with experts in reliability modeling and contributing to advancements in the field.







# 6. Potential Risks

While the TWIN-RELECT project presents significant opportunities for advancing reliability analysis and fault tolerance in electronic systems, it also faces several potential risks that could impact its successful execution. These risks include technical challenges, logistical issues, and financial constraints. This section highlights the key risks identified, alongside proposed mitigation strategies, to ensure that the project can adapt to challenges and continue progressing towards its objectives.

## 6.1 Coping with Technical Risks

#### Simulation Limitations

To mitigate the limitations of simulation tools, multiple platforms should be used, and models must be validated against past irradiation data. Simplified models can reduce the computational load, provided that their accuracy is checked against the detailed models and the available experimental results.

#### **Radiation Testing Challenges**

Preliminary low-dose tests should be conducted before full irradiation campaigns. Cross-checking results obtained from different radiation sources helps confirm their accuracy, and keeping radiation-tolerant backup devices available can prevent complete test failures.

#### **Reliability & Aging Factors**

Combining accelerated aging tests, such as High Temperature Reverse Bias (HTRB) and High Temperature Operating Life (HTOL), with radiation exposure helps emulate real-world conditions. Parametric monitoring can detect early degradation trends, and machine learning models should be employed to predict reliability trends based on experimental data.
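To relate the duration of a temperature-accelerated test such as HTOL to equivalent operating time, the Arrhenius acceleration factor is commonly used. The sketch below shows this standard formula; the activation energy of 0.7 eV and the temperatures are hypothetical placeholders, since the appropriate values depend on the failure mechanism and technology, and the function names are illustrative only.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between use and stress temperatures.

    AF = exp(Ea / k * (1 / T_use - 1 / T_stress)), temperatures in kelvin.
    ea_ev is the activation energy in eV; inputs are in degrees Celsius.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    if t_use_k <= 0 or t_stress_k <= 0:
        raise ValueError("temperatures must be above absolute zero")
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_stress_k))


def equivalent_use_hours(stress_hours, ea_ev, t_use_c, t_stress_c):
    """Operating hours at t_use_c represented by stress_hours at t_stress_c."""
    return stress_hours * arrhenius_af(ea_ev, t_use_c, t_stress_c)


if __name__ == "__main__":
    # Hypothetical case: 1000 h at 125 C, use condition 55 C, Ea = 0.7 eV.
    af = arrhenius_af(0.7, t_use_c=55.0, t_stress_c=125.0)
    print(af, equivalent_use_hours(1000.0, 0.7, 55.0, 125.0))
```

This covers thermal acceleration only. Voltage acceleration and any interaction with radiation-induced degradation would need separate models fitted to the experimental data.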

## 6.2 Coping with Logistical & Operational Risks

#### Equipment & Facility Risks

To ensure access to irradiation facilities, test slots should be booked early, and backup testing locations, such as university labs or facilities available through international partnerships, should be identified. Alternative testing methods, such as laser irradiation, can serve as substitutes when irradiation sources are unavailable.

#### Safety & Compliance Issues

Strict radiation handling protocols, including personal protective equipment (PPE), shielding, and controlled exposure times, must be enforced. Collaboration with radiation safety officers (RSOs) supports regulatory compliance and strengthens safety measures.

## 6.3 Coping with Financial & Resource Risks

To mitigate financial risks, a contingency budget of 15-20% should be allocated for unforeseen expenses. Prioritizing critical tests and simulations optimizes resource allocation, helping to ensure that key objectives are met without excessive spending.







# 7. Conclusions

The TWIN-RELECT project is set to advance the field of reliability analysis and fault tolerance in modern electronic systems, with a strong focus on transient and permanent fault analysis and mitigation. This report outlines a comprehensive research framework for reliability-aware circuit design, integrating fault modeling, analysis, and optimization into a unified EDA tool flow (TWIN-RELECT EDA Tool Flow, Figure 1). In particular, by leveraging advanced fault modeling, STA-based analysis, and fault mitigation strategies, the project aims to enhance the reliability of modern, complex circuits while preserving their performance.

Key aspects of the research include:

- Assessing radiation effects across multiple abstraction levels, from the physical layer to circuit and system levels.
- Developing accurate models for various fault types, including soft errors, aging, TID effects, and interactions among multiple faults.
- Utilizing Static Timing Analysis (STA) and AI-driven methods to accelerate fault detection and propagation studies.
- Integrating fault-tolerance mechanisms to enhance circuit reliability while balancing performance, power consumption, and area overhead.
- Evaluating fault models and mitigation techniques utilizing data obtained from irradiation experiments.

Beyond its technical objectives, the TWIN-RELECT project aims to contribute to the advancement of reliable electronics through several key initiatives. These include boosting collaboration between academic and industrial partners, encouraging knowledge sharing through publications in prestigious journals and conferences, and fostering joint research activities. The long-term training of UTH members in collaboration with the other project partners is also a significant endeavor, since the knowledge gained will be integrated directly into the project research. TWIN-RELECT's ultimate purpose is to make a significant contribution to the field of reliability-aware electronic systems, both enriching the academic body of knowledge and providing practical solutions for industry.



